Abstract

To study mechanisms that cause the non- 
Gaussian nature of network traffic, we analyzed 
IP flow statistics. For greedy flows in particular, 
we investigated the hop counts between source 
and destination nodes, and classified applications 
by the port number. We found that the main 
flows contributing to the non-Gaussian nature of
network traffic were HTTP flows with relatively 
small hop counts compared with the average hop 
counts of all flows. 



1 Introduction 

Recently, it has been found that the characteristics
of the marginal distribution of network traffic
are crucial for modeling network traffic to
evaluate performance [2]. That is, marginal
distributions are far from Gaussian and are skewed
positively in many cases, and this nature is 
strongly correlated with network performance. It 
has also been found that this non-Gaussian na- 
ture of network traffic has a correlation with the 
heavy-tailedness of the per-time-block flow size 
distribution. That is, according to the power-law 
of the distribution, some flows send a tremen- 
dously large number of packets in a given short 
time while most other flows send a rather small 
number of packets. As the nature of these
greedy flows contributes to the non-Gaussian na- 
ture of network traffic, it is important to study 
their nature in detail. In this work, to investigate 
greedy flows, we studied hop counts and types of 
applications in them. 



2 Data 

We used the trace data from the MAWI traffic 
archive [4] measured at sample point-B between
September and November 2001. The line is a 
100-Mbps link with 18-Mbps CAR (committed 
access rate); it is one of the international lines 
of the WIDE project. All traces were measured 
during daily busy hours (14:00 - ) and contained 
about 2.9 to 3.0 million packets. For this study,
we used one-way US-to-Japan traffic because the 
average amount of traffic is much larger than in 
the opposite direction^1. For all traces, we
calculated the average rate and the skewness of the
traffic variability, that is, the time series of
throughput measured over time intervals of 0.1 s.
The time-averaged throughput of each trace ranged
from 6.02 Mbps to 34.70 Mbps (ensemble average
18.80 Mbps). The skewness of each trace ranged
from -0.61 to 2.71 (ensemble average 0.68).
We removed traces with skewness smaller
than 0.4 because our goal was to investigate the 
characteristics of network traffic having a non- 
Gaussian nature. In total, we used 68 traces for 
this work. 
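
The trace selection step described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the packet record layout (timestamp in seconds, size in bytes) and all function names are assumptions, and the skewness is taken to be the standard moment-based definition, which the text does not spell out.

```python
def throughput_series(packets, interval=0.1):
    """Bin (timestamp_s, size_bytes) pairs into throughput samples in Mbps.

    The record layout is an assumption made for this sketch.
    """
    if not packets:
        return []
    start = min(t for t, _ in packets)
    end = max(t for t, _ in packets)
    n_bins = int((end - start) / interval) + 1
    bits = [0.0] * n_bins
    for t, size in packets:
        bits[int((t - start) / interval)] += size * 8
    return [b / interval / 1e6 for b in bits]


def skewness(samples):
    """Moment-based skewness: third central moment over the cubed std. dev."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def keep_trace(packets, min_skewness=0.4, interval=0.1):
    """Keep a trace only if its throughput series is sufficiently skewed."""
    return skewness(throughput_series(packets, interval)) >= min_skewness
```

A trace with one large burst yields a positively skewed series and is kept, while a trace whose throughput varies symmetrically around its mean is dropped.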



3 Per-time-block flow analysis 

We divided each trace into time blocks $T_i$, where
$1 \le i \le M$, as illustrated in Figure 1. For
all $i$, the length of $T_i$ was set to the time
interval $\tau$. For each $T_i$, we define flows
$fl_j(T_i)$, where $1 \le j \le N_{T_i}$ and
$N_{T_i}$ is the number of flows during $T_i$. Each
flow $fl_j(T_i)$ is defined as the set of packets
having an identical combination of source IP
address, destination IP address, source port,
destination port, and protocol. A flow $fl_j(T_i)$
must contain at least two packets during $T_i$. In
this work, $\tau$ was set to 0.1 s. For each flow
$fl_j(T_i)$ ($1 \le j \le N_{T_i}$), we counted the
number of packets $N_p(fl_j(T_i))$. Figure 2 shows
the log-log complementary cumulative distribution
(LLCD) plots of $N_p(fl_j(T_i))$ for all traces.
These were in good agreement with a power law, as
demonstrated in [2]. In this work, we focused on
greedy flows, that is, the tail part of the
distribution. We define a greedy flow as one whose
$N_p(fl_j(T_i))$ is larger than 20 (right side of
the dashed line in Figure 2), which corresponds to
a throughput of about 1 Mbps assuming an average
packet size of 700 bytes.

^1 To estimate hop counts, we needed to use traffic in
both directions.
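
The per-time-block flow definition and the greedy-flow threshold can be sketched as follows. This is only an illustration under stated assumptions: the packet record layout, the function names, and the choice to index blocks from timestamp zero are hypothetical, while the block length (0.1 s), the two-packet minimum, the threshold of 20 packets, and the 700-byte average packet size come from the text.

```python
from collections import Counter

TAU = 0.1              # time-block length in seconds (from the text)
GREEDY_THRESHOLD = 20  # packets per block (from the text)


def per_block_flow_counts(packets, tau=TAU):
    """Count packets per (time block, 5-tuple).

    `packets` is an iterable of (timestamp_s, src_ip, dst_ip, src_port,
    dst_port, protocol) tuples; this layout is an assumption.
    Flows with fewer than two packets in a block are discarded.
    """
    counts = Counter()
    for ts, *five_tuple in packets:
        counts[(int(ts // tau), tuple(five_tuple))] += 1
    return {key: n for key, n in counts.items() if n >= 2}


def greedy_flows(flow_counts, threshold=GREEDY_THRESHOLD):
    """Select flows whose per-block packet count exceeds the threshold."""
    return {key: n for key, n in flow_counts.items() if n > threshold}


def throughput_mbps(n_packets, packet_bytes=700, tau=TAU):
    """Throughput implied by n_packets of a given size within one block."""
    return n_packets * packet_bytes * 8 / tau / 1e6
```

With these definitions, 20 packets of 700 bytes in a 0.1 s block correspond to 20 x 700 x 8 / 0.1 = 1.12 Mbps, which matches the "about 1 Mbps" figure quoted above.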



Figure 1: Diagram of per-time-block flow statistics.

Figure 2: LLCD plots of $N_p(fl_j(T_i))$.



4 Hop count estimation

4.1 Estimation technique

To study hop counts between two nodes from the
given trace data, we used the TTL (time to live)
field of an IP packet. As its value is decremented
each time an IP packet passes a router, we can
estimate the hop count between the source node and
the measuring point from the initial TTL value and
the TTL value of the received packet. So, if we
can obtain the hop counts from both the source
and destination nodes to the measuring point, we
can estimate the hop count between these nodes^2.
One difficulty with this approach is that the
initial TTL values depend on the operating system
or network equipment such as routers (see [3],
for example). So first of all, we must determine
the initial TTL value of source nodes. In
this work, we used passive OS fingerprinting to
estimate hop counts as accurately as possible.
This technique is based on the principle that
every system has its own IP stack implementation.
That is, we can identify a system with high
accuracy from certain values recorded in the IP
packets it sends. More detailed information
about passive OS fingerprinting can be found in
[3]. In this work, we modified the source code
of p0f [5] and estimated the hop counts of each
IP flow. In our study, we could obtain estimates
for more than 10% of the systems in each trace.
We assume that the statistics of the flows whose
hop counts we could estimate are representative
of all flows.
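
The TTL arithmetic behind this estimate can be sketched as follows. Note that this sketch swaps in a much simpler heuristic than the fingerprinting approach described above: instead of identifying the operating system, it guesses the initial TTL as the smallest commonly used default that is not below the observed value. The candidate list and all function names are assumptions made for illustration.

```python
# Assumed candidate defaults; real systems may use other initial values,
# which is exactly why the study relied on OS fingerprinting instead.
COMMON_INITIAL_TTLS = (32, 64, 128, 255)


def guess_initial_ttl(observed_ttl):
    """Smallest assumed default TTL that is >= the observed TTL."""
    if observed_ttl < 0:
        raise ValueError(f"TTL out of range: {observed_ttl}")
    for initial in COMMON_INITIAL_TTLS:
        if observed_ttl <= initial:
            return initial
    raise ValueError(f"TTL out of range: {observed_ttl}")


def hops_to_monitor(observed_ttl, initial_ttl=None):
    """Hops between a sender and the measuring point.

    Pass `initial_ttl` when it is known (e.g. from fingerprinting);
    otherwise the heuristic guess is used.
    """
    if initial_ttl is None:
        initial_ttl = guess_initial_ttl(observed_ttl)
    return initial_ttl - observed_ttl


def end_to_end_hops(src_observed_ttl, dst_observed_ttl):
    """Sum of both sides' distances to the measuring point.

    Assumes, as the text does, that forward and reverse paths coincide.
    """
    return hops_to_monitor(src_observed_ttl) + hops_to_monitor(dst_observed_ttl)
```

For example, a packet observed with TTL 52 is taken to have started at 64 and travelled 12 hops. If fingerprinting shows the sender actually starts at 128, passing `initial_ttl=128` gives 76 instead, which illustrates why the initial value matters.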



^2 Here, we assumed, for convenience, that the routing
paths in both directions were the same.
4.2 Hop counts of greedy flows

For each flow $fl_j(T_i)$ defined in Sec. 3, we
estimated the hop count $\mathrm{hop}(fl_j(T_i))$ using
the above technique. Figure 3(a) shows the relationship
between $\mathrm{hop}(fl_j(T_i))$ and $N_p(fl_j(T_i))$ for a
certain trace. Figure 3(b) shows the histogram
of $\mathrm{hop}(fl_j(T_i))$ for the same trace. The hop
counts of greedy flows (above the dashed line in
(a)) tend to be smaller than those
of all flows.

Then we investigated the histograms of
$\mathrm{hop}(fl_j(T_i))$ for all traces. Figure 4 shows
the histograms for (a) all flows and (b) greedy flows.
The average hop count of greedy flows was smaller
than that of all flows, and most greedy flows had
relatively small hop counts. Specifically, the average
hop counts were 19.85 for all flows and 17.92 for
greedy flows (see the dashed lines in Figure 4). These
results can be interpreted using the fact that the
RTTs of flows with smaller hop counts tend to
be smaller, as demonstrated in [1], and TCP flows
with smaller RTTs can make their window sizes
larger following the mechanism of TCP flow control.
That is, TCP flows with smaller hop counts
can make their window sizes larger and be greedier.
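
The RTT argument above can be illustrated with a deliberately simplified model. The sketch assumes loss-free additive increase (the window grows by one segment per round trip) and one window of packets sent per RTT. The RTT values and function names are hypothetical and are not taken from the measurements.

```python
def window_after(duration_s, rtt_s, initial_window=1):
    """Window (in segments) after loss-free additive increase.

    Simplifying assumption: the window grows by one segment per RTT.
    """
    round_trips = int(duration_s / rtt_s)
    return initial_window + round_trips


def packets_per_block(window_segments, rtt_s, tau=0.1):
    """Approximate packets sent in a block of length tau,
    assuming one full window is sent per round trip."""
    return window_segments * tau / rtt_s
```

Under these assumptions, after one second a flow with a 50 ms RTT has a window of 21 segments and sends about 42 packets per 0.1 s block, whereas a flow with a 200 ms RTT reaches only 6 segments and about 3 packets per block. Only the first would exceed the greedy-flow threshold of 20 packets.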
Figure 3: (a) $\mathrm{hop}(fl_j(T_i))$ versus $N_p(fl_j(T_i))$,
(b) histogram of $\mathrm{hop}(fl_j(T_i))$.

Figure 4: Histogram of $\mathrm{hop}(fl_j(T_i))$ for (a) all
flows, (b) greedy flows.



5 Breakdown of applications

We investigated the breakdown of applications
for each flow using the port numbers (Table 1).
We can immediately see that the proportion of
HTTP was much larger among greedy flows than
among all flows. So it is reasonable to assume
that HTTP plays an important role in producing
greedy flows. In future work, we will focus on the
causal mechanisms that make HTTP flows greedier
than other applications and on their relationship
with hop counts.



Table 1: Breakdown of applications

             All flows   Greedy flows
HTTP            54%          70%
other TCP       38%          20%
UDP              7%           6%
other            1%           4%
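
A port-based breakdown like the one in Table 1 could be computed along the following lines. This is a sketch under assumptions: the text does not say which ports were counted as HTTP, so port 80 is assumed here, and the flow record layout and function names are hypothetical.

```python
from collections import Counter

HTTP_PORTS = {80}  # assumption: the text does not list the ports used
TCP, UDP = 6, 17   # IP protocol numbers for TCP and UDP


def classify(protocol, src_port, dst_port):
    """Map a flow to one of the four categories used in Table 1."""
    if protocol == TCP:
        if {src_port, dst_port} & HTTP_PORTS:
            return "HTTP"
        return "other TCP"
    if protocol == UDP:
        return "UDP"
    return "other"


def breakdown(flows):
    """Percentage of flows per category.

    `flows` is an iterable of (protocol, src_port, dst_port) triples;
    this layout is an assumption made for the sketch.
    """
    counts = Counter(classify(*flow) for flow in flows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: 100.0 * n / total for category, n in counts.items()}
```

Running `breakdown` once over all flows and once over the greedy subset gives the two columns of such a table.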



6 Summary 

To study mechanisms that cause the non- 
Gaussian nature of network traffic, we investi- 
gated the properties of greedy IP flows. Our 
main findings are as follows. (1) Hop counts of 
greedy flows were relatively smaller than those of 
all flows. (2) HTTP was the main application for 
greedy flows. Since the most popular applications
on the Internet today are client-server file-transfer
applications such as the WWW, we believe
that the results of this work suggest the ubiquitous
existence of greedy flows, which cause the
non-Gaussian nature of network traffic.



draft: IEICE General Conference 2002, Tokyo, Japan






References 



[1] Keita Fujii, Shigeki Goto, Correlation between
Hop Count and Packet Transfer Time,
APAN/IWS2000, February 2000

[2] Tatsuya Mori, Ryoichi Kawahara, A study on the
marginal distribution of network traffic,
IEICE Technical Report, IN 2001-107, pp. 1-7, 2001

[3] Lance Spitzner, Know Your Enemy: Passive
Fingerprinting,
http://project.honeynet.org/papers/finger/

[4] WIDE MAWI Working Group Traffic Archive,
http://tracer.csl.sony.co.jp/mawi/

[5] passive OS fingerprinting tool p0f,
http://www.stearns.org/p0f/






